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A NEW SELECTION OPERATOR FOR THE DISCRETE 
EMPIRICAL INTERPOLATION METHOD IMPROVED A PRIORI 
ERROR BOUND AND EXTENSIONS 

ZLATKO DRMAC* AND SERKAN GUGERCINt 


Abstract. This paper introduces a new framework for constructing the Discrete Empirical 
Interpolation Method (DEIM) projection operator. The interpolation node selection procedure is 
formulated using the QR factorization with column pivoting, and it enjoys a sharper error bound for 
the DEIM projection error. Furthermore, for a subspace lA given as the range of an orthonormal U, 
the DEIM projection does not change if U is replaced by with arbitrary unitary matrix Q.. In a 
large-scale setting, the new approach allows modifications that use only randomly sampled rows of 
U, but with the potential of producing good approximations with corresponding probabilistic error 
bounds. Another salient feature of the new framework is that robust and efficient software imple¬ 
mentation is easily developed, based on readily available high performance linear algebra packages. 
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1. Introduction. Direct numerical simulation of dynamical systems plays a cru¬ 
cial role in studying a great variety of complex physical phenomena in areas ranging 
from neuron modeling to microchip design. The ever-increasing demand for accu¬ 
racy leads to dynamical systems of ever-larger scale and complexity. Simulation in 
such large-scale settings can make overwhelming demands on computational resources; 
thus creating a need for model reduction to create smaller, faster approximations to 
complex dynamical systems that still guarantee high fidelity. 

1.1. Model Reduction by Galerkin Projection. Consider the following non¬ 
linear dynamical system of ordinary differential equations (ODE) 


E x{t) = Ax{t) + -\-B ^{t), t > 0, 


( 1 . 1 ) 


where E,Ag 


B e 


and g : [0, oo) —M"". In (1.11, x{t) G 


is the state and g(t) is the external forcing term (input); thus (1.11 has n degrees of 
freedom and v inputs. 

Systems of the form (1.1) with very large state-space dimension (n « 0(10®) or 
higher) arise in many disciplines and typically originate from discretization of partial 
differential equation models. The goal of model reduction is to replace with a 
reduced surrogate dynamical system having much lower state space dimension, r <C n. 
The reduced model will then have the structure 


E^ Xr{t) = Ar Xr{t) + {r{Xrit)) + Bj. g(t). 


( 1 . 2 ) 


where E^, A^ G 


Br G 


and fr- 


We will use a Galerkin projection to construct the reduced model (1.2): Let Vr 
be an r-dimensional subspace spanned by the columns of V G Then, we ap¬ 

proximate the full-state x{t) using the ansatz x{t) « Vxr{t) and enforce the Galerkin 
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orthogonality condition EVxr{t) — AVxr{t) — i{Vxr{t)) — Bg{t) _L Vr to obtain the 
reduced model (1.2) with the reduced model quantities given by 


Er = V^EV, Ar = V'^AV, Br = V^B, and f,.(x^(t)) = V'^i{Vxr{t)). (1.3) 

1.2. Galerkin Projection using Proper Orthogonal Decomposition. For 

linear dynamical systems, i.e., when f = 0 in (1.1), a plethora of methods exist 


to perform model reduction: These include gramian based methods such as Balanced 
Truncation |37LI36) and Optimal Hankel Norm Approximation |24j or rational interpo¬ 
lation based methods such Iterative Rational Krylov Algorithm [57]. These methods 
rely on the concept of transfer function and perform model reduction independent 
of the input g(t). These ideas have been recently extended to systems with bilin¬ 
ear [mniEII and quadratic nonlinearities (MUD- For general nonlinearities, Proper 
Orthogonal Decomposition (POD) is the most-commonly used method. POD |35l [9] 
obtains the model reduction basis V from a truncated SVD approximation to a matrix 
of “snapshots”, a numerically computed trajectory of the full model. It is related to 
methods (and known by other names) such as Principal Component Analysis (PCA) 
in statistical analysis |55| and Karhunen-Loeve expansion in stochastic analysis. 

To construct the model reduction basis V via POD, one performs a numerical 
simulation of for an input g(<) and initial condition xq. Let xo,xi,... ^xn-i 
denote the snapshots resulting from this numerical simulation; i.e, Xi = x{ti) S M" 
for i = 0,1,..., A^ — 1. Construct the POD snapshot matrix 


= [ Xo,Xi,X2, .. . ,XAr_i] e 


1-kN 


and compute its thin SVD 


= 


(1.4) 


(1.5) 


where Z S S G and Y G with Z = Y^Y = Ijv, and 

S = diag((Ti, (72,..., tTna), with ai > a 2 > ■ ■ ■ > <Jn > 0. Then model reduction 
by POD chooses V as the leading r left singular vectors of X corresponding to the r 
largest singular values. Using MATLAB notation, this corresponds to U = Z{:, 1 : r). 
This basis selection by POD minimizes 11^* ~ ^ with 

orthonormal columns. Since the objective function does not change if $ is post- 
multiplied by an arbitrary r x r orthogonal matrix, this procedure actually seeks an 
r-dimensional subspace that optimally captures the snapshots in the least squares 
sense. For more details on POD, we refer the reader to [28l l32] . 


1.3. The lifting bottleneck. Even though the state Xrit) of the reduced model 
(1.2) lives in an r-dimensional subspace, definition of the reduced nonlinear term 


{rixr(t)) = V^{{Vxr{t)) in (1.3) requires lifting Xr{t) back to the full n-dimensional 


subspace in order to evaluate the nonlinear term; this is known as the lifting bottleneck 
and degrades the performance of reduced models for nonlinear systems. Various 
approaches exist to tackle this issue; see, e.g., lai El EESl 114] . In this paper, we 
focus on the Discrete Empirical Interpolation Method (DEIM) |39j . a discrete variant 
of the Empirical Interpolation Method introduced in jSj. 

As explained in the original source |39| . DEIM can be used to approximate and 
efficiently evaluate a general nonlinear function f, which is not necessarily tied to the 
model reduction set-up we discussed above. For example, f(r) could be a vector¬ 
valued function of possibly multidimensional parameter r. Therefore, following |39j . 
we will present the DEIM construction and our analysis for a generic nonlinear vector 
valued function f(T), yet will point out the implications for nonlinear model reduction. 
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1.4. DEIM. Given a nonlinear function f : T —K" with T and a matrix 
U € of rank to, DEIM approximation of f is defined by [351 Definition 3.1] 


f(T) = U(S^ U)-^S^f(T), 


( 1 . 6 ) 


where S is n x to matrix obtained by selecting certain columns of the n x n identity 


matrix I. With the DEIM approximation to f defined as in (1.6), the nonlinear term 


in the reduced model (1.2) is now approximated by 


(1.7) 


An effective numerical implementation of {r{xr(t)) is different than its analytical for¬ 
mula in (1.7) and allows computing {^{Xrit)) without lifting Xr{t) to the full dimension 


n and by only selecting a certain rows of VXr (t). We skip those details and refer the 
reader to [351 §3.4]. 

Computation of the DEIM basis U. In an application, the matrix U can be com¬ 
puted as follows. For a finite grid 7 h C T, the function is sampled at tj G 7h and, as 
done for state x(t) in POD for model reduction, the function values, nonlinear snap¬ 
shots, are collected in a matrix F, i.e., F = [f (ti), f(T 2 ),... ,f(T„)]. If f(r) is ni x 712 
matrix valued, the vec(-) operator is used to map its range to Then, an or¬ 

thogonal projection D = UU^, of low rank to, onto the range U = 77.(U) is constructed 
so that \\F — is minimal. Typically, m n. Therefore, U can be considered 

as the POD basis for the nonlinear snapshots. The hope is that the range of U will 
capture the values of f over the entire parameter space, i.e., l|f(T) — UU^f(r)j |2 will 
be sufficiently small at any r G T. 

The role of S, which we will call selection operator, is to strategically pick coor¬ 
dinate indices in K” at which the approximant interpolates f. (Note that S^f(T) = 
S^f(r).) The DEIM algorithm, proposed in [35], forces the selection operator S to seek 
TO linearly independent rows of U such that the local growth of the spectral norm of 
(S^U)“^ is limited via a greedy search, as implemented in Algorithm]^ This objective 
is founded in the following theoretical basis of DEIM [35] Lemma 3.2]: 

Lemma 1.1. Let U G jg orthonormal (D*U = Im, m < n) and let 

/ = U(S^U)-iS^/ (1.8) 

be the DEIM projection of an arbitrary f G K", withS computed by Algorithm^ Then 
||/-/|| 2 <cl|(I-UU*)/ll 2 , C=ll(S^U)-ll| 2 , (1.9) 


where 




Hence, we can focus on a pure matrix theoretical problemH Given orthonormal 
U G (U*U = Irn) find a row selection matrix S with jj(^U)“^jj 2 as small as 

possible. If 77.(U) captures the behavior of f well over the given parameter space, and 


if S results in a moderate value of c in (1.9), the DEIM approximation will succeed. 


The error bound (1.91 in Lemma 1.1 


is rather pessimistic and the DEIM projection 
usually performs substantially better in practice, see [39j for several illustrations of 


^From now on, we consider the problem over the complex field. 
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Algorithm 1 DEIM (Discrete Empirical Interpolation Method) [39l Algorithm 1] 
1 ; Input: ui,, Um linearly independent. 

2: Output: Selection operator S = Sm (implicitly by pm). 

3: Pi = argmaXj(|ui(i)|) ; Ui = [ui]; Si = [cpj ; pi = [pi] 

4: for j = 2 : m do 

5: Solve Sj_;^Uj_iZ = Sj_iUj for z ; 

6: Tj = Uj - Uj_iZ ; Pj = argmaXj(|rj(z)|) ; 

~ j % “ i Pj ~ (pj-iiPj) i 

8 ; end for 


superior performance of DEIM. Hence, this is an interesting theoretical question: can 
the upper bound can be improved, and what selection operator S will have a sharper a 
priori error bound, perhaps only mildly dependent on n?. 

Note that S computed in Algorithm depends on a particular basis for U; just 
reordering the basis vectors may result in different S. If, for example, U consists of 
the left singular vectors of the m dominant singular values of the data samples matrix 
F, and if some of those singular values are multiple or tightly clustered, then some 
singular vectors (columns of U) are non-unique or are numerically badly determined 
by the data and the computed U could be algorithm dependent. But the subspace they 
span is well-determined. Therefore, from both the theoretical and practical points of 
view, it is important to ask whether we can ejficiently construct S with an a priori 
assurance that c will be moderate and independent of the choice of an orthonormal 
basis U ofU. 

Our interest for studying DEIM in more detail was triggered by the above the¬ 
oretical questions from a numerical linear algebra point of view, and by a practical 
question of efficient implementation of DEIM as mathematical software on high per¬ 
formance computing machinery. In i 2.1.2[ the complexity of Algorithmis estimated 
to be 0(jnfn) + 0{m^). Unfortunately, it has unfavorable flop per memory reference 
ratio (level 2 BLAS) which precludes efficient software implementation. It would be 
advantageous to have an algorithm based on BLAS 3 building blocks, with potential 
for parallel implementations. Furthermore, we may ask whether the contribution of 
the factor n in the overall complexity can be reduced or even removed (e.g. using 
only a subset of the rows of \J) without substantial loss in the quality of the computed 
selection operator. 

Fortunately, an affirmative answer to all the questions above is surprisingly simple 
and effective: QR factorization with column pivoting of U*. Our new implementation 
of DEIM, designated as Q-DEIM, computes S independent of a particular orthonor¬ 
mal basis U, enjoys a better upper bound for the condition number c of the DEIM 
projection, and in practice computes S with usually smaller value of ||(S^U )“^||2 than 
the original DEIM algorithm. A further advantage of Q-DEIM is that it is based on 
numerically robust high performance procedures, already available in software pack¬ 
ages such as LAPACK, ScaLAPACK, MATLAB, so no additional effort is needed for 
tuning high performance DEIM. The details and a theoretical foundation of Q-DEIM 
are given in ^ In particular, in S 2.1 we provide a selection procedure and theoretical 
analysis showing that the DEIM projection is almost as good as the orthogonal pro¬ 
jection onto the range of U. Numerical experiments that illustrate the performance 
of Q-DEIM in the context of nonlinear model reduction are presented in §2.2| In ^ 
we show that accurate DEIM projection is possible even with using only a small por- 
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tion of the rows of U, and we introduce Q-DEIMr, a restricted and randomized DEIM 
selection that combines the technique used in Q-DEIM with the ideas of randomized 
sampling. Further developments and applications are outlined in ^ 


2. A new DEIM framework. A key observation leading to a selection strategy 
presented in this section is based on a solution to a similar problem in |19] . arising 
in the proof of global convergence of a block version of the Jacobi algorithm for 
diagonalization of Hermitian matrices. There, a row permutation is needed such that 
the (1,1) diagonal block of a family of 2 x 2 block partitioned unitary matrices has 
a uniform lower bound for its smallest singular value, independent of the family and 
only depending on the parameters of the partition (block dimensions). 

It is clear that the selection of well conditioned submatrices is deeply connected 
with rank revelation, and that in fact the most reliable rank revealing QR factoriza¬ 
tions are indeed based on selecting certain well conditioned submatrices, see e.g. m- 
Note, however, that in our case here, the rank is not an issue, as our matrix U is 
orthonormal. 

We adapt the strategy from m and use it in ^2.1| as a basis for introducing a 
new framework for construction of the DEIM projection; the result is a new selection 
method, called Q-DEIM, with an improved theoretical bound on c and with the selec¬ 
tion operator invariant under arbitrary changes of the orthonormal basis of the range 
of U. We also use the seminal work of Goreinov, Tyrtyshnikov and Zamarshkin 
to show that DEIM projection is not only numerically but also theoretically almost as 
good as the orthogonal projection, up to a factor of the dimension. 


2.1. Q-DEIM a new selection procedure. An answer to all practical ques¬ 
tions raised in ^is given in the following theorem. Its constructive proof is based on 
m, but we provide all the details for the reader’s convenience, and also because we 
need them in the further developments in 0 

Theorem 2.1. Let U e U*U = m < n. Then : 

• There exists an algorithm to compute a selection operator S with complexity 
0{nw?), such that 


ll(S'^U) ^\\2 < 'Jn - m + l 

and for any f £ 


y/4m _|_ 0 ^ _ 2 ^ 


11/ - U(S^U)-iS^/||2 < ^^0(2"*) 11/ - UU7II2. 


//U is only full column rank, then the bound (2.1) changes to 

||(S^U)-i| 


< 


y/n — m -b 1 V4’” -f 6m — 1 


O-min(U) 


( 2 . 1 ) 


( 2 . 2 ) 


(2.3) 


• There exists a selection operator S* such that the DEIM projection error is 
bounded by 


11/ - U(Sf U)-iSf/lb < 7l+m(n-m) |i/ - UU 7 II 2 . (2.4) 


• The selection operators S, S* do not change if U is changed to UD, where 12 
is arbitrary mxm unitary matrix, i.e., the selection of indices is assigned to 
a point on the Stiefel manifold, represented by U. 
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Proof. Let W = U* e C™^", and let 


wn = (Wi 
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0 
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* 
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(2.5) 


be a column pivoted (rank revealing) QR factorization. We have at our disposal a 
variety of pivoting strategies that reveal the numerical rank by constructing R in a way 
to control the condition numbers (explicitly or implicitly) of its leading submatrices. 

For instance, the Businger-Golub pivoting [T3] at step i first determines a smallest 
local index pi of the largest (in Euclidean norm) column in the submatrix {i : m,i : n) 
and swaps globally the columns i and pi = i—l+pi in the whole matrix. The following 
scheme illustrates the case with n = 7, m = 4, i = 2, p 2 = 3, and p 2 = 4: 
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Then, the QR step maps the f-th column in the sub-matrix (i : m,i : n) to and 
keeps all the remaining column norms in the submatrix unchanged and bounded by 
|Rii|. (Here denotes f-th canonical vector of appropriate dimension.) The product 
of all transpositions gives the permutation H. 

We define the selection operator S as the one that collects the columns of W to 
build Wi ; this implies that S^U = WJ and we need to estimate |jW]"^|j 2 . Partition 
R in (2.5) as R = (T K) with mxm upper triangular T. Then Wi = QT, and the 
problem reduces to bounding ||T“^|| 2 . As a result of the pivoting (2.6), the matrix T, 
as the leading mxm submatrix of R, has a special diagonal dominance structure: 


k 

|Tiip >'V |Tjfcp, 1 < j < fc < m; [T^^l = max |Rm^-|. (2.7) 

j—m:n 

j-i 


Further, since W = WH = QR and since WW* = U*U = QRR*Q* = we conclude 
that RR* = Irm which implies that 

n 

1 = ||R(m, :)||2 = |T^mp-k ^ \P^mj\'^ < {n - m + , (2.8) 

j—rn-\-l 


and that 


min |Tii| = | 

mm I 

i—l:m 


> 


1 

y/n — m + 1 


(2.9) 


If we set D = diag(Tii)™ T = D ^T, then |jT ^||2 < \/n — m + l||ir ^|| 2 . Further, 
if we assume U to be just of rank m, not necessarily orthonormal, then 


IT 


mm 


> 


O'min(T) > 


I|R(W, :)||2 ^ gmin(R) 

\/n — m + 1 ~ — m + 1 

O'min(U) 1 

y/n-m+1 ||t-i ||2 
6 


O'min(U) 


( 2 . 10 ) 


( 2 . 11 ) 






















Since S^U = W* = T*Q *, it follows that ||(S^U) 


-ii 


= IIT 


-ii 


12 = l/crniin(T). Hence, 
to prove (2.1) and (2.3) it remains to estimate the norm of = T“^D. This can be 


done using an analysis of Faddeev, Kublanovskaya and Faddeeva [22], that can also 
be found in [33j- Systematic use of (2.7) as in [321 Chapter 6] yields the following 
useful inequalities 


|T- 


1 


IT. 


(2*- 


;) 2—3 


4, 2, 1, 1, 0, 


0)^, z = 2, ...,TO, 


where the absolute value and the inequality between vectors are understood element¬ 
wise. For i = 1, trivially, we have T“^ei = ei(l/Tii), and T“^ei = ei. For i = 
2,... ,m we use the relations T“^ei = T“^Dei = to conclude 

|f-ie,| < (2*-2 2*-3 ..., 4, 2, 1, 1, 0, ... O)^, 


and thus (2.1), (2.3) follow by (2.9), (2.10), (2.11) and 


||t ^2 < ||t ^lli. < g(m), where g(m) 


m i — 2 




m 


+EE4^' 


\/4™ + 6m — 1 
~ 3 


If U is changed to UH with unitary H, then the column pivoted QR is computed with 
W = il*U* = H*W on input. The fact that the QR factorization of W or of W js 
implicitly the Cholesky factorization of the Hermitian semidefinite H = W*W = W*W 
extends to the pivoted factorizations as well. In the first step, obviously, looking for 
the largest diagonal entry of H in the pivoted Cholesky factorization is equivalent to 
looking for the column of W (or W) of largest Euclidean length. Hence, the pivoting 
will select the same columns in both cases. After k steps of annihilations using 
Householder reflectors with appropriate column interchanges, the intermediate result 

0 ^' j j and it is easily checked that (Wj 22 j)*Wj 22 ] equals the Schur 

complement at the corresponding step in the pivoted Cholesky factorization of H. 
Hence, the next step will have the same pivot selection in both processes. 

The existence of S* is based on an elegant argument by Goreinov et al. [33] . 
who used the concept of matrix volume (the absolute value of the determinant). The 
selection S* is defined to be the one that maximizes the volume of S^U over all 
(m) ~ mi(n-m)'. ^ ^ ™ submatrices of U. Then, by [331 Lemma 2.1], 

||(S* U )~^||2 < \/l + m{n - m). (2-12) 


Since postmultiplying U by a unitary H cannot change the volume of any m x m 
submatrix of U, the same maximizing volume submatrix will be selected. (Here we 
assume that a simple additional condition is imposed to assure unique selection in 
the case of several maxima. For instance, among multiple choices, select the one with 
smallest lexicographically ordered indices.) The error bounds (2.2) and (2.4) follow 
by inserting the corresponding bounds (2.1) and (2.12) for c into (1.9). □ 

Remark 2.2. According to [33], a slightly better bound y/ (4™“i + 2)/3 can be 
used instead of g{m). It should be emphasized that the 0(2"*) upper bound is attained 
only on a contrived example (the notorious Kahan matrix m) and in practice it can 
be replaced by 0(m). Also, in the application of DEIM, m is assumed small to modest, 
so ||T “^||2 can be estimated in 0{rn?) or even 0{m) time using a suitable condition 
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number estimator; the factor y/n — m + 1 can be replaced by the actually computed 
and potentially smaller value l/IT^ml- The deployment of an incremental condition 
estimator will be particularly important in a randomized sampling version of Q-DEIM 
introduced in 

More sophisticated rank revealing QR factorization can further reduce the upper 
bound on c, but practical experience shows that the pivoting used in Theorem |2.1| 
works very well. It has been conjectured in [5^ that (2.12) can be replaced with 
||(S^U )“^||2 < \/n, and proved that no bound smaller than ^/n can exist in general. 

Remark 2.3. While the existence and superiority of S* are clear, its construction 
is difficult. However, we can use its characterization to understand why the selection 


operator S defined in Theorem 2.1 (and the Businger-Golub pivoting in general) usu¬ 
ally works very well in practice. The volume of the submatrix selected by S equals the 
volume nr=i iTd of the upper triangular T, which is the leading m x m submatrix 
of the computed R factor. On the other hand, the pivoting, by design, at each step 
tries to produce maximal possible |Tii|; thus it can be interpreted as a greedy volume 
maximizing scheme. In fact, such an interpretation motivates post-processing to in¬ 
crease the determinant, e.g. by replacing trailing submatrix T (m — 1: m, m — 1: m) of 
T by better choices, obtained by inspecting the determinants of 2 x 2 submatrices of 
R(m — 1: m, m -b 1: n) and moving the corresponding columns upfront. 

Example 2.4. We illustrate the difference in the values of c = ||(S^U)“^|j2 
computed by DEIM and Q-DEIM using 200 randomly generated orthonormal matrices 
of size 10000 x 100. The Q-DEIM selection not only enjoys better upper bound, but 
it also in most cases provides smaller actual value of c, as seen on Figure |2.1| It is 
interesting to note that all Q-DEIM values of ||(S’^U )~^||2 are below 100, sustaining 
the conjectured bound ||(Sj'U )“^||2 < \/n for the volume maximizing scheme. On 
the other hand, DEIM breaches the ^/n upper bound in most of the trials, indicating 
less optimal selection with respect to the volume maximizing criterion. In the case 
of matrices specially constructed to exhibit large pivot growth, the value of c in both 
methods may exceed but not with a big factor. We also compare DEIM and 
Q- DEIM using a 2048 x 100 basis for a FitzHugh-Naguma system, analyzed in detail 
in S 2.2.1 The value of c for Q-DEIM is fixed at 2.6878e -I- 01 < >72048, independent 
of the orthogonal changes of the basis. 

2.1.1. Implementation details. In terms of the row selection from U, the 
actual computation used in Theorem |2.1| is an LQ factorization of U with row pivoting. 
The transposition and QR with column pivoting is used only for convenience and due 
to the availability of software implementations. A Householder based QR factorization 
of a fat mxn matrix runs with complexity 0{m^n), similar to the complexity of DEIM. 
LAPACK [T] based software tools use the optimized, BLAS 3 based and robust [20] 
function xGEQPS. Other pivoting strategies are possible, such as in xGEQPX, xGEQPY in 
m,m- On parallel computing machinery, our new approach uses the best available 
QR code with column pivoting; e.g. PxGEQPF from ScaLAPACK [T^ . 

As an illustration of the simplicity at which we get high performance computation 
of a good selection operator, and to make a case for Q-DEIM, we briefly describe a 


MATLAB implementation. Using the notation of Theorem 2.1 we have 


u = n 


.^Q*, S'^U = T*Q*, and thus M= U(S'^U)-i . (2. 


13) 


The computation T ^K = T ^(D ^K)bya triangular solver (e.g. the backslash 
\ or linsolveO in MATLAB) is numerically stable as T is well conditioned and 








Comparison of tlie constant c = 



The ratio c(Q-DEIM)/c(DEIM) 


The ratio c(Q-DEIM)/c(DEIM) 




Fig. 2.1. (Example |2.4[ ) Comparison of the value c = ||(§^U )“^||2 i'o^ DEIM and Q-DEIM. 
The first row: The comparison using 200 random orthonormal matrices of size 10000 x 100. The 
second row: 200 random changes of a DEIM orthonormal basis U of size 2048 x 100, computed 
from simulation of the FitzHugh-Naguma system, see |2.2.1[ The basis changes are obtained by 
post-multiplication by random 100 x 100 real orthogonal matrices (uniformly distributed in the Haar 
measure). 


maxij |(D^^K)y| < 1. The explicitly set identity matrix in (2.13) guarantees that 
the se lected entries of a vector / will be exactly interpolated when M is computed 
as in (2.13). If M is computed as M = computed{l}/U)) (e.g. by MATLAB’s 
slash) then S^M = {eipmxm, with all at roundoff level. If Si,..., Sm are the 
interpolation indices selected by S, then checking the interpolation for / G K" yields 

= fsi(l + eu) + ^fsjeij, i = 1,... ,to, 


revealing an undesirable pollution of fs^, in particular if maxg^^s. \fs^ \ >> |/sj. 


function [ S, M ] = q_deim( U ) ; 

% Input : U n—by—m with orthonormal columns 

% Output : S selection of m row indices with guaranteed upper bound 
% norm {inv (U (S, : ) ) ) <= sqrt (n—m+1) * 0 (2 '"m) . 

% : M the matrix U*inv{U(S,:)); 

% The Q—DEIM projection of an n—by—1 vector f is M*f(S). 

% Coded by Zlatko Drmac, April 2015. 

[n,m] = size (U) ; 

if nargout == 1 

= qr{U 'vector' ) ; S = P(l:m) ; 

else 

[Q,R,P] = qr {U vector' ) ; S = P(l:m) ; 

M = [eye(m) ; {R{:,1:m)\R{:,m+1:n)) ' ] ; 

Pinverse(P) =1 : n ; M = M(Pinverse,:) ; 

end 
end 
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2.1.2. DEIM and LU with partial pivoting. It has been known, at least to 
experts, that DEIM is a variation of Gaussian elimination; Sorensen gniisH] called 
it a pivoted LU without replacement. Recently, [5] proposed to replace Step 7 of 
Algorithmic Uj = (Uj_i uj), by Uj = (Uj_i r) (To make the distinction clear, 
we denote the new variable by U and we use Uj to denote the matrix U(:,l : j) at 
the end of the jth step.). In other words, the basis Uj_i is updated by adding the 
current residual vector. Assume that at step j, Uj_i = U_;_iGj_i, where Gj_i is 
unit upper triangular and that both computations have the same selection (this 
holds at j = 2). Then the residual r = Uj — is easily shown 

to be the same as r in Algorithm Hence, the updated Sj will be the same, and 
Uj = (Uj_l f) = (Uj_l Uj) Gj, with an updated unit upper triangular Gj. 


Note that SjUj = (Sj_i e 


Pj 




r) is lower triangular at each step: The 


upper triangular part of the last column of this product is Sj_jr, which is zero by the 
definition of r. Therefore, S^U = S^UG = Z with a lower trapezoidal Z and a unit 
upper triangular G. Hence, DEIM (with replacement) is the same as a row pivoted LU 
decomposition. This connection might help understanding why DEIM behaves much 
better t han the theoretical upper bound would suggest. Similar to the discussion in 
DEIM is applying a locally greedy search to maximize the volume of S^U: 


2.3 


Remark 

Because U = UG with a unit upper triangular G, det(S^U) = det(U) = YlJLi Zjj, and 
the Zjj’s are results of a greedy search for maxima. 


The modified update yielding U reduces the computational complexity of U(S’^U) ^ 
to O(nm^) down from 0(nm^)+ 0{m^) because the LU decomposition of S^U is no 
longer necessary. However, the claim in [2] that Algorithm [^contains O(m^) complex¬ 
ity is misleading because a practical implementation of Step 5, Sj_jUjZ = Sj_iUj, will 
not compute the LU decomposition of from scratch in each step. Instead, 

it will exploit the fact that, in the step, is changed by appending only a 

new row and a column and will update the LU decomposition with complexity 0{j^); 
thus making the total complexity of Algorithmof 0{nm'^)+ 0{m^). 


This connection also suggests using a rank revealing LU decomposition of U with 
complete pivoting and taking the indices of the first m pivoted rows as the DEIM 
indices. The bound on c changes only by a factor originating from the inverse of 
the unit upper triangular LU factor, which, as a consequence of complete pivoting, is 
bounded by 0(2”^). On the other hand, in the case of partial pivoting like in DEIM, 
the bound is rather pessimistic. However, as expected, DEIM behaves much better 
in practice, as partial LU hardly exhibits the worst case growth scenario. It is an 
interesting challenge to determine what orthonormal basis of the range of U is best 
for the performances of DEIM. 


2.2. Model Reduction Examples. In this section, we test the performance of 
the new selection procedure on two model reduction benchmark problems. 

2.2.1. The FitzHugh-Naguma (F—N) System. The F-N system, a simpli¬ 
fied version of the Hodgkin-Huxley model, arises in modeling the activation and 
deactivation dynamics of a spiking neuron. This example is borrowed from [39j and 
we follow their description of the model, including their notation and parameter se¬ 
lection. Let V and w denote, respectively, the voltage and recovery of voltage. Also, 
let X £ [0,L] and t>0. Then, the underlying dynamics are described by the coupled 
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nonlinear PDEs 


evtix, t) = e'^Vxxix, t) + /(f (cc, t)) - w{x, t) + c 
Wt(x, t) = bv{x, t) — 'yw{x, t) + c 


(2.14) 

(2.15) 


with the nonlinearity appearing as f{v) = v{v — 0.1)(1 — v) and with the initial and 
boundary conditions 


n(a;, 0)=0, ?«(a:, 0)=0, a;e[0 ,L], 

VxiO,t) =-io(t), Vx{L,t)=0, t > 0, 

where the model parameters are chosen as L = 1, e = 0.015, b = 0.5, 7 = 2, c = 0.05 
and the stimulus io{t) = 50000t^e“^®*. A finite difference discretization leads to a 
system of the form O) with system dimension n = 2048. A time-domain simula¬ 
tion with equally spaced points for t = [0, 8 ] leads to N = 100 state and nonlinear 
snapshots. Following |39j . we choose r = m = 5 and perform model reduction using 
both DEIM selection procedures. We simulate both reduced models, collect reduced- 
order snapshots Xdeim and Xq_deim- Then, to measure the error in model reduc¬ 
tion, we lift these two snapshots back to the original dimension, i.e., we compute 
I^Xdeim and EXq.deim where V is the POD basis for model reduction, and measure 
their distance (in the relative Frobenius-norm) from the original snapshot X. Let 
cdeim = and cq-deim = denote the resulting errors. For 

r = TO = 5, we obtain cdeim = 3.500673 x 10“^ and cq-deim = 3.467286 x 10“^. 

To illustrate that this is the usual behavior, i.e., the Q-DEIM selection performs 
as well as the original DEIM selection, we test other r = to values as well: 

r = TO = 4 : cdeim = 4.291788 x 10 eq-deim = 3.446203 x 10 ^ 

r = TO = 6 : eoEiM = 3.300680 x 10"^ eq-deim = 3.260097 x lO'^ 

r = TO = 7 : cdeim = 2.998979 x 10 cq-deim = 3.010827 x 10 


To better illustrate the comparison, we measure, in relative 2-norm, how accurately 
each entry of x{t) is reconstructed with DEIM and Q-DEIM. Therefore, we measure 
the relative 2-norm distance between the rows of the original snapshot matrix X 
and those of the reconstructed ones FXdeim and FXq.deim- The results are shown in 
Figure 2.2 once more, illustrating that both procedures perform equally well. 


2.2.2. Nonlinear RC Model. This is a model of nonlinear RC-ladder circuit 
another benchmark example for model reduction. The nonlinearity results 
from resistors that are in a parallel connection with diodes; the diode 1-V character¬ 
istics have the nonlinearity io = — 1 where i_D is the current through the diode 

and vd the voltage across it. The input is the current source entering at node 1. 

We take n = 1000, i.e., connect 1000 such ladders and excite the system using 
the exponential forcing g(t) = e“b A numerical simulation over t = [0,7] seconds 
results in 1425 POD snapshots Xi and 1425 nonlinear (DEIM) snapshots i{xi). Decay 
of the POD and DEIM singular values are shown in the left-hand side plot of Figure 
|2.3[ Based on this decay, we pick r = to = 10 and apply POD with both DEIM and 
Q-DEIM selections. In this example, the voltage at node 1, i.e., the first component of 
x{t) (denoted by is the quantity of interest and we measure how both reduced 


^The model can be downloaded from Max Planck Institute Model Reduction Wiki page at 
http://morwiki.mpi-magdeburg.mpg.de/morwiki/index.php/Nonlinear _RC_ Ladder 


11 







0.1 

0.05 

0 


Error in the snapshot reconstruction 



200 400 600 800 1000 1200 1400 1600 1800 2000 



200 400 600 800 1000 1200 1400 1600 1800 2000 






r=m=7 



nriM 



- Q-DEIM 

_1 







^*****^ 1 




200 400 600 800 1000 1200 1400 1600 1800 2000 
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Fig. 2.2. Example |2.2.1| Comparison of the relative error in the snapshot reconstruction for 
the F-N model for different r and m values. The horizontal axis variable k corresponds to the A:’'*' 
row of x{f) for which the relative error is computed. 


models approximate As shown on Figure 2.3 both methods perform extremely 
well; the reduced-model quantities are virtually indistinguishable from the original. 


As in the previous example, we compute the reconstruction errors due to DEIM 
and Q-DEIM, and obtain cdeim = 8.603826 x 10“^ and eq-deim = 6.07172 x 10“^; once 
again high accuracy for both models. The reconstruction errors in ^i(t) due to DEIM 
and Q-DEIM are 1.28183 x 10“"* and 7.783045 x 10“® respectively. Once we increase 
r = m = 10 to r = m = 20, both reduced models become even more accurate with 
edeim = 1.970500 x 10“^ and eq_deim = 1.931018 x 10“^. The reconstruction errors 
in ^i(t) are now 3.209967 x 10-’^ for DEIM and 3.238549 x 10-^ for Q-DEIM. 



Fig. 2.3. (Example \2.2.2\l The left-plot shows the decay of the POD and DEIM singular values. 
The right-plot shows the reconstruction accuracy of ^i(t) by both selection methods. 
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To further illustrate the dependence of the error on the reduced dimension, we 
test both DEIM and Q-DEIM for 1 < r = m < 20. The results depicted in Figure [2^ 
below confirm the earlier observations. 



Fig. 2.4. 


fExamplel2.2.2j) Relative errors eQ^iM and eQ_QEiM for varying r and m values. 


The two other excitation selections suggested for this model are g(t) = sin(27r50t) 
and g(t) = sin(27rl000t) [T7] . For these two inputs, for all the r and m combinations 
we have tried, DEIM and Q-DEIM have returned exactly the same selection matrix S; 
resulting in exactly same reduced model that is very accurate, with relative errors of 
0(10“^) even with r = m = 5. For brevity, we omit the resulting figures. 


3. Using restricted/randomized basis information. The framework intro¬ 
duced in fallows various modifications. Here we describe one, introducing Q-DEIMr, 
a version of Q-DEIM that works only on a random selection of the rows of U. This 
introduces the techniques of randomized sampling in the theory and practice of DEIM, 
but it also provides a new approach to sampling from orthonormal matrices m- 
If the dimension n is large, it would be advantageous to determine a good selection 
operator S in a way to reduce the 0{nn?n) factor in the complexity of the algorithm. 
From the proof of Theorem 2.1 it follows that, for the quality of the DEIM projection, 
we only need to ensure that the upper triangular matrix T = R(1 :m, 1 :m) has small 
inverse. But T is the pivoted QR triangular factor of certain columns of W, and our 
initial task is to find their indices; we have no interest in the QR factorization as 
such. This immediately suggests that we may attempt to find such indices using only 
a small selection of the columns of W (i.e. of the rows of u)g 

The following scheme depicts the main idea: suppose we randomly sample k > m 
columns of W (marked by |) and assemble them in a local working m x k array L. 
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(3.1) 


Then we attempt QR with column pivoting on L with a built-in Incremental Condition 
Estimator (ICE) that at any step j very efficiently estimates the norm of the inverse 
of the thus far constructed part of the triangular factor L(I: j, I: j)Q 


^In another situation, e.g. in a case of gappy POD approximation, we may want to avoid some 
rows of U because e.g. they correspond to spatial coordinates with corrupted or missing information. 
“^Here we assume that the triangular factor overwrites the corresponding part of the array L. 
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This is done as follows: Suppose the first j — 1 steps have been successful and the 
computed triangular factor ((o*) in (3.2)) is well conditioned. The corresponding 
global column indices (in W) that correspond to these columns in L are stored and 
removed from the active set of indices from which random selection is made. All 
Householder reflectors used are accumulated in a m x to matrix S. 

In the jth step, a new pivot column is selected and swapped to be the jth one, 
and the single Householder reflector is applied only to that column to annihilate its 
positions j + 1 to to, and to compute the {j,j)th position, (® in ( |3.2| )). At this moment 
we have all the ingredients to compute the value = ||L(1 : j, 1 : (marked 

/ * * X \ I 

as ( p p ^ ) ill (3.2)). If is below given threshold, the factorization continues by 
accepting the pivotal column, completing the jth step and looking for the next pivot. 
If not, it means that the {j,j)th position, (@ in (3.2)) is too small, and, due to 
pivoting, that all entries in the active submatrix of L (© in (3.2)) are also small. 
In that case, the columns j to fc in L are useless for our purposes and we discard 
them and draw new k — j + 1 columns from the active set of columns of W ( J in 
(3.1)). Before using newly selected columns as a part of L, we need to update them by 
applying unitary matrix S which contains accumulated all previous transformations. 
The resulting new columns in L (* in (|3.2[)) can now participate in pivoting. 
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(3.2) 


At this point we may simply choose to continue with the factorization, search for the 
next pivot column, swap it to the jth position, test "fj against the threshold and accept 
if it passes the test, as illustrated in (3.2). Another option is to discard all previous 


pivoting and start a completely new one by determining the largest column for the 
first position, and proceed with a completely new pivoting process on the updated 
contents of L. The value of k is not necessarily fixed and may change dynamically 
with a safety device to prevent failure. With proper data structure, one can develop a 
detailed algorithm and an efficient software implementation. For the sake of brevity, 
we omit the details. However, we provide one illustrative example. 

Example 3.1. Let f(t;p.) = 10e“'^‘(cos(4p,t) +sin(4/xt)), l<t<6 , 0</j,<7 r. 
Take 40 uniformly sampled values of fj, and compute the snapshots over the discretized 
t-domain at n = 10000 uniformly spaced nodes. The best low rank approximation of 
the sampled 10000 x 40 returned U with to = 34 columns. This indicates that the 
POD basis has captured the function’s behavior. We allowed Q-DEIMr to process only 
k = m columns in the work array L, and set the upper bound for c alj^-yfro-s/*^ — to + 1 . 
Column index drawing is done simply: £ “random” indices are taken as the £ leading 
indices of randomly permuted active set. We stress here that the purpose of this 
example is to illustrate the idea and its potential, to motivate further study of the 
randomized sampling approach to DEIM projection. 

After processing 113 rows of U (out of 10000), Q-DEIMr selected a submatrix 
with c « 181.45; DEIM processed the whole matrix U and returned c « 79.13. To 
test how well the two methods approximate f, we compute its value at 200 points in 
the /r-interval: for each Hj the function is evaluated over the i-grid giving G K". 


^This ad hoc choice is motivated by the structure of the upper bound for c, as in the proof of 
Theorem 2.1 Note that we use yGn instead of the worst case theoretical 0(2"*) bound. 
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The same is done with DEIM and Q-DEIM projections giving and 

respectively. The results of a comparison are depicted in Figure |3.1[ Since at this 



Fig. 3.1. (Example |3.1p Comparison of the approximation errors of DEIM and Q~DEIMr 
for f{t; fi) = 10e“^^(cos(4/i£) + sin(4/it)). Left figure: The function evaluated at p = 1.2787. Right 
figure: The relative errors 11/°™-/^, h/Wfuj h, h/\\fuj h f°'>' 200 uniformly spaced 

values of pj £ [0, tt]. Q-DEIMr used 113 rows (at most 34 at the same time) of U to make a selection; 
DEIM used all 10000 rows. 


computed errors in approximating 





computed errors in approximating f(t,/i) 





Fig. 3.2. 


(Example 3.1) Comparison of the relative errors \\f^^'^- 


ffiLf 112/11//^, • II2 


11/®,“'“- 


/mj II 2 /II//J 3 II 2 for = lOe '^*(cos(4^t) + sin(4/it)). Left figure: Upper bound in Q-DEIMr set 

to m^/n — m 1; it used 53 rows with c Ri 2532.9. Right figure: Upper bound in Q-DEIMr set to 
y/m\Jn — m + 1/5; it used 220 rows with c r: 103.1. In both cases, DEIM used all 10000 rows of U 
to make a selection, and Q-DEIMr was allowed to process at most 34 rows of U at the same time. 


point no sophisticated sampling strategy is used, the results may vary, depending on 
n, m, and the given upper bound for c. Figure |3.2| illustrates how the prescribed 
upper bound for c changes the execution and performance of Q-DEIMr. By visiting 
only 220 rows, we fully recover the accuracy achieved by using all 10000 rows of U. 

Remark 3.2. The technical details of using ICE in the above procedure are 
similar to the rank revealing method with windowed column pivoting m- However, 
the overall procedure is substantially different in spirit. The difference is that our 
objective is not to compute a rank revealing QR factorization (we know that W is of 
full row rank, WW* = and we do not even need its QR factorization), but just to 
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find a well conditioned submatrix. This allows to touch only a selection of columns 
of W and exit when a sufficiently well conditioned submatrix is determined. 

Remark 3.3. The quest for a well-conditioned submatrix of W can be obviously 
parallelized, and many processors can independently work on different (not necessarily 
disjoint) subsets of column indices, with no need whatsoever to engage in communi¬ 
cation, until one finds suitable columns and sends a halt signal. If more than one 
selection is found, the best one will be chosen. 

The column selection procedure can be improved at a cost of one pass through the 
array W to compute the column norms Ui = ||W(:,*)|| 2 . Such additional information 
can be used e.g. in the following two ways: 

• Select from the sorted columns in batches, as needed, starting from the largest 
ones. The norms of the columns in the active set can also be down-dated, using the 
procedure from column pivoted QR with numerically robust implementation |2()j . 

• Define pi = a;|), i = Then (pi,...,p„) is a probability 

distribution that can be used to draw column samples. It prefers larger columns. 
Each column is used only once, and the distribution is computed for the active set. 

This opens a completely new aspect of DEIM and establishes its connection to 
randomized numerical linear algebra, in particular with randomized sampling of rows 
of orthonormal matrices |3D]. In particular, one can view Q-DEIMr as a guided ran¬ 
domized sampling algorithm for orthonormal matrices. Detailed analysis of blending 
the two procedures is omitted here; it will be available in our subsequent work. 

Remark 3.4. Our experiments with randomized selection indicate that merely 
picking m random interpolation indices will not work in general and sophisticated 
strategies of DEIM, Q-DEIM, Q-DEIMr are necessary for a reliable and robust black-box 
procedure. Indeed, this is already revealed in Example |3.1| the initial random selection 
is not enough and Q-DEIMr brings in additional rows to process. Clearly, the results 
depend on the function being approximated but just to illustrate the importance of the 
selection principle of DEIM and Q-DEIMr, below we show the reconstruction accuracy 
of DEIM and randomly selected indices (with few trials to select better rows, but 
without the condition number control as in Q-DEIMr) for two nonlinear parametrized 
functions. As the figures illustrate, the random selection without the techniques from 
the DEIM procedures may perform very poorly. 



Fig. 3.3. (Remark |3.4p The reconstruction accuracy of a plain random selection of indices, 
as comp ared to Q-DEIM and Q-DEIMr. The first plot uses the data generated by the function from 
Example 13 The function used for te second plot is /(x, p) = smh{(p * cos]i(p/x))), 0.1 < a: < 6, 
0 < p < TT, and m = 11 out of n = 2000 indices are selected. 
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4. Conclusions and Future Work. Using the tools from QR factorization 
with column pivoting, this paper has introduced a new DEIM index selection strategy, 
that is invariant under orthogonal transformations, with a sharper error bound for 
the DEIM projection error. The new approach, called Q-DEIM, is tested on several 
numerical examples and it performs as well as the original DEIM selection procedure. 
For the cases of large dimensions, a modification is proposed that uses only randomly 
sampled rows of the data matrix yet still leading to high-fidelity approximations. 

In addition to the nonlinear model reduction and parametrized function settings 
we presented here, the new Q-DEIM selection is well suited for several important 
applications that we are currently investigating for our subsequent work. One such 
application is randomized sampling of rows of orthonormal matrices for solving the 
least-squares problems effectively in the cases with huge row dimension. The sec¬ 
ond one is the DEIM induced CUR factorization recently introduced by Sorensen and 
Embree |41] . A third application arises in nonlinear inversion and parametric model 
reduction, see, e.g., um, where an affine decomposition is needed for a parametric 
matrix A{p) € for efficient online model reduction step. This is usually han¬ 

dled by vectorizing A{p) which might lead to very large row dimension depending 
on the sparsity pattern of A{p). We are currently testing Q-DEIM and Q-DEIMr in 
these settings. Further, both DEIM and Q-DEIM (including Q-DEIMr) can adopt an 
updating scheme when the nonlinear snapshot basis is obtained by the SVD: one can 
enlarge m and U incrementally to yield better approximations. In Q-DEIM, increasing 
m will require updating a rank-revealing QR-decomposition; thus finding an efficient 
updating scheme for this incremental implementation will prove very useful. 
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